ADDITIVE REPRESENTATIONS FOR 
TWO-DIMENSIONAL TABLES 



Roger Pennell 

Educational Testing Service 
Abstract 

Given that we collect observations $A_iP_j$ from two perfectly crossed 
factors we may be interested in fitting a model such as $f(A_iP_j) = \alpha_i + \beta_j$. 
An iterative method for computing the scale values $\alpha_i$ and $\beta_j$ and the 
function $f$ is developed. The procedure is relevant to problems of finding 
monotonic transformations eliminating interaction effects preceding analysis 
of variance and to the classical conjoint measurement model. 



I. INTRODUCTION 

When conjoint measurement (CM) was finally exposited in its complete or 
nearly complete form by Luce and Tukey in 1964, it was heralded by many 
as the panacea for measurement problems in the behavioral sciences, or, 
if not a panacea, certainly the forerunner of an axiomatized, complete 
measurement system. Indeed certain refinements, simplifications and generalizations 
followed (Scott, 1964; Krantz, 1964; Roskies, 1965), but the utilization of 
CM as a tool for research in the behavioral sciences did not. The reasons 
for this lack of enthusiasm by the researcher in the field are complex, but 
probably include the following: (1) only the most sophisticated of readers 
could wade through the myriad of involved axioms and theorems; (2) the 
how-to-do-it part of the model was by and large lacking. It is the second 
difficulty to which this paper hopes to make a modest contribution. 

CM, as expounded by Luce and Tukey, assumes two sets of "events" or 
factors, $A$ and $P$, and a weak ordering¹ ($\geq$) on $A \times P$, where $A \times P$ 
is the cartesian product of $A$ and $P$. This is most easily understood 
in an analysis of variance context where we have two perfectly crossed 
factors $A_i$ and $P_j$, where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, $m, n \geq 2$, 
and we make observations, or record responses, on each of the cells in the 
design $A_iP_j$. We can then generate a weak ordering by lining up the cells, 
the $A_iP_j$, in a nonincreasing sequence. Given this, and certain other 
conditions discussed below, the axioms assert that there exist three 
functions $\phi$, $\theta$, and $\Lambda$ such that 

(1)    $$\phi(A \times P) = \theta(A) + \Lambda(P)$$


That is, there exist functions which will transform the observations, or 
responses, and the sets of events, A and P , such that we generate an 
additive model. In the analysis of variance context, we have found a 
transformation which permits us to disregard the interaction term and speak 
of a completely additive model of main effects. 

Besides the axiom concerning a weak ordering, certain other conditions 
must be met for (1) to be obtainable: (a) the levels of the factors must 
be sufficiently finely graded such that for any $A_iP_j$ and $A_k$, $i \neq k$, 
there exists a $P_\ell$ such that $A_iP_j = A_kP_\ell$; (b) given a set of observations 
$A \times P$ it is necessary that the data conform to certain, essentially 
transitivity, requirements. Formally, given $A_i, A_j, A_k$ and $P_\ell, P_m, P_n$, 
$A_iP_m \geq A_jP_n$ and $A_jP_\ell \geq A_kP_m$ implies $A_iP_\ell \geq A_kP_n$; and (c) an archimedean 
axiom² for ordered sets. 
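
The transitivity-like requirement above can be checked mechanically on a finite table. The following is a minimal sketch, assuming the condition reads "cell (i, m) >= cell (j, n) and cell (j, l) >= cell (k, m) together imply cell (i, l) >= cell (k, n)"; the function name and the two small tables are hypothetical illustrations, not part of the original program.

```python
from itertools import product


def cancellation_violations(table):
    """Count index choices (i, j, k, l, m, n) that violate the condition.

    Assumed condition: table[i][m] >= table[j][n] and
    table[j][l] >= table[k][m] imply table[i][l] >= table[k][n].
    """
    rows = range(len(table))
    cols = range(len(table[0]))
    violations = 0
    for i, j, k in product(rows, repeat=3):
        for l, m, n in product(cols, repeat=3):
            premise = (table[i][m] >= table[j][n]
                       and table[j][l] >= table[k][m])
            if premise and not table[i][l] >= table[k][n]:
                violations += 1
    return violations


# An exactly additive table (row effect + column effect) can never violate it.
additive = [[a + b for b in (1, 4, 6)] for a in (0, 2, 5)]

# A hypothetical table built so that the premise holds but the conclusion fails.
crossed = [[1, 9, 0], [9, 0, 2], [0, 3, 5]]

print(cancellation_violations(additive), cancellation_violations(crossed))
```

A count of zero is only a necessary sign of additivity on the cells examined; as the text notes, the remaining conditions are not routinely verifiable on finite data.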

In practice the weak ordering conditions are easy to verify on an empir- 
ical set of data; however, the additional conditions enumerated above are 
not routinely verifiable. Even if we are able to apply the axioms to a 
finite set of data and deduce that the data do not conform to the axioms, 
we cannot conclude that an additive representation does not exist, since 
the axioms of CM are merely sufficient, not necessary. Zinnes (1969) 
points out another difficulty: we quite naturally cannot expect the axioms 
to hold exactly; how close, then, must they be before we accept them as 
being satisfied? There is as yet no statistical theory for determining 
the goodness of fit of the data to the axioms. 

This paper will outline a straightforward approach to achieving CM 
largely in the context of analysis of variance. The approach may fail for 
some particular data set in hand; however, from the results decisions can 
be made as to the utility of the additive representation obtained. 

In analysis of variance, transformations of the data are almost never made 
for reasons other than achieving homogeneity of variance. Finding a 
transformation that will yield some model as a strictly additive function of the 
main effects and an error term, thus eliminating an interaction effect, is 
rarely considered, even though the existence of such an interaction effect 
may simply be a result of the scale upon which the factors were measured. In 
any case a transformation is considered admissible only if it is monotonic, 
thus preserving the ordering of the cell means. We shall deal with the most 
general class of monotone functions for achieving an additive representation. 

Shepard (1962) was the first to explicitly state the notion of nonmetric 
monotonicity as a criterion for admissible transformations of observed data. 
That is, a transformation is considered admissible only if the ordinal proper- 
ties of the original data are maintained after transformation. Certainly 
in the context of analysis of variance, it is inconceivable to consider 
transformations which may invert the order of the cell means (Winer, 1962). 
Kruskal (1964a, 1964b) implemented Shepard's original notions into a 
powerful algorithm for resolving a set of data into its dimensional 
components. The crux of Kruskal's program is the generation of a monotone 
regression of reproduced data on original data. At each step in the program 
the data points, in a space of given dimensionality, are altered slightly 
to maximize this regression. Since only the ordinal characteristics of 
the original data are of interest (more appropriately, the ordinal 
relations are the maximal information obtainable), we can't strictly talk of 
maximizing a regression (that is, doing arithmetic on ordinal numbers). 
Thus, Kruskal essentially discards the original data while preserving only 
its rank ordering. Symbolically, given a sequence of data $D = \{d_i\}_{i=1}^{N}$, 
let $\sigma$ be a permutation of the first $N$ integers such that 
$d_{\sigma(1)} \geq d_{\sigma(2)} \geq \cdots \geq d_{\sigma(N)}$. The function $\sigma$ is the only 
characteristic of the data retained. The monotone regression problem is then solved 
by applying the same function to the reproduced data and seeking a 
partition which renders it also nonincreasing. For details of this procedure 
see Kruskal (1964b). 
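
The rank-order step and the monotone regression can be illustrated with a short sketch. It assumes the usual pool-adjacent-violators scheme for the least-squares nonincreasing fit, which may differ in detail from Kruskal's program; the function names and the four-point data set are hypothetical.

```python
def rank_permutation(data):
    """Indices sigma with data[sigma[0]] >= data[sigma[1]] >= ... (ties keep input order)."""
    return sorted(range(len(data)), key=lambda i: -data[i])


def monotone_regression(values):
    """Least-squares nonincreasing fit to `values` by pooling adjacent violators."""
    blocks = []  # each block is [sum of pooled values, number of pooled values]
    for v in values:
        blocks.append([v, 1])
        # Pool while the previous block mean is below the current one,
        # because that pair breaks the nonincreasing order.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


data = [47.0, 1.0, 95.0, 18.0]        # original observations (hypothetical)
reproduced = [40.0, 5.0, 60.0, 45.0]  # values reproduced by some model (hypothetical)

sigma = rank_permutation(data)                 # only the order of the data is kept
ordered = [reproduced[i] for i in sigma]       # reproduced values in data order
fit = monotone_regression(ordered)             # closest nonincreasing sequence
print(sigma, fit)
```

Here the third and fourth reproduced values are out of order relative to the data, so they are pooled into their mean; the other values are left untouched.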

II. METHOD 

We shall use essentially the same approach in finding a transformation, 
$f(A_iP_j)$, which is exactly monotonically related to the original $A_iP_j$. 
Explicitly we shall try to fit a model of the form 

(2)    $$f(A_iP_j) = \alpha_i + \beta_j + e_{ij}$$

To do this we shall use a measure similar to Kruskal's stress, $S$: 

(3)    $$S = \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} e_{ij}^2 \Big/ \sum_{i=1}^{n} \sum_{j=1}^{m} (\alpha_i + \beta_j)^2 \right]^{1/2}$$
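
As a small numerical illustration of the stress measure (3), the sketch below evaluates it for a hypothetical 2 x 2 set of transformed values and scale values; the function name and the numbers are assumptions made for the example only.

```python
import math


def stress(f, alpha, beta):
    """Stress S of equation (3): residual sum of squares over fitted sum of squares."""
    residual_ss = 0.0
    fitted_ss = 0.0
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            e = f[i][j] - a - b        # e_ij in equation (2)
            residual_ss += e * e
            fitted_ss += (a + b) ** 2
    return math.sqrt(residual_ss / fitted_ss)


f = [[1.0, 2.0], [3.0, 5.0]]     # hypothetical transformed data f(A_i P_j)
alpha = [0.0, 2.0]               # hypothetical row scale values
beta = [1.0, 2.0]                # hypothetical column scale values

print(stress(f, alpha, beta))                        # one cell is off by 1
print(stress([[1.0, 2.0], [3.0, 4.0]], alpha, beta)) # exactly additive table
```

For the first table only one residual is nonzero, so $S = \sqrt{1/30}$; for the exactly additive table $S$ is zero.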



Minimizing (3) is equivalent to finding a set of $\alpha_i$ and $\beta_j$ which 
reproduce, in an additive fashion, a transformation of the original data. We 
shall assume, without loss of generality, that $\sum_{j=1}^{m} \beta_j = 0$. Taking the 
partial derivative of (3) with respect to the $\alpha_v$ and $\beta_u$, and noting 
that minimizing $S$ is equivalent to minimizing $S^2$, we obtain 



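(4a)    $$\frac{\partial S^2}{\partial \alpha_v} \propto mL\alpha_v - L \sum_{j=1}^{m} f(A_vP_j) - mK\alpha_v = 0$$

(4b)    $$\frac{\partial S^2}{\partial \beta_u} \propto (L - K)\left[ \sum_{i=1}^{n} \alpha_i + n\beta_u \right] - L \sum_{i=1}^{n} f(A_iP_u) = 0 ,$$

where $K = \sum_{i=1}^{n} \sum_{j=1}^{m} e_{ij}^2$ and $L = \sum_{i=1}^{n} \sum_{j=1}^{m} (\alpha_i + \beta_j)^2$, and the 
common positive factor $2/L^2$ has been dropped from both derivatives. Since equation (4) 
is a rather complicated quadratic we choose to use a gradient method 
(Kunz, 1957) in order to drive $\partial S^2/\partial \alpha_v$ and $\partial S^2/\partial \beta_u$ to zero. That is to say, 
given a set of initial approximations to start the process, say $\alpha_{i0}$ and 
$\beta_{j0}$, the estimates of the final $\alpha_i$ and $\beta_j$ after the $k$-th iteration 
will be 

(5)    $$\alpha_{ik} = \alpha_{i,k-1} - \lambda \left. \frac{\partial S^2}{\partial \alpha_i} \right|_{k-1}, \qquad \beta_{jk} = \beta_{j,k-1} - \lambda \left. \frac{\partial S^2}{\partial \beta_j} \right|_{k-1}$$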



As an aside we can note that heuristically one can picture (5) as 
hunting in a space of $n + m$ dimensions (the parameter space of the $\alpha_i$'s and 
$\beta_j$'s) for a point providing a minimum for $S$. We can consider the point 
having coordinates $\alpha_{i0}$, $i = 1, 2, \ldots, n$, $\beta_{j0}$, $j = 1, 2, \ldots, m$, as lying 
on a hypersurface of constant $S_0$. By evaluating (5) we move in a direction 
perpendicular to the hypersurface, inward towards the minimum of $S$. We move 
inward until we just graze another hypersurface of constant, say, $S_1$, 
reevaluate (4) and again move inward toward the minimum. Ultimately the process 
should converge to a minimum for $S$. 
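
The gradient of $S^2 = K/L$ and one step of (5) can be sketched as follows. This is an illustration under stated assumptions: the derivatives are written out in full by the quotient rule, without using the constraint on the sum of the column scale values, and the step size of 1.0 is an arbitrary placeholder; all names and numbers are hypothetical.

```python
def stress_squared(f, alpha, beta):
    """S^2 = K / L for transformed data f and scale values alpha, beta."""
    K = sum((f[i][j] - a - b) ** 2
            for i, a in enumerate(alpha) for j, b in enumerate(beta))
    L = sum((a + b) ** 2 for a in alpha for b in beta)
    return K / L


def gradient(f, alpha, beta):
    """Partial derivatives V_i = dS^2/d(alpha_i) and Q_j = dS^2/d(beta_j)."""
    n, m = len(alpha), len(beta)
    e = [[f[i][j] - alpha[i] - beta[j] for j in range(m)] for i in range(n)]
    K = sum(x * x for row in e for x in row)
    L = sum((a + b) ** 2 for a in alpha for b in beta)
    # Quotient rule: d(K/L) = (L dK - K dL) / L^2, with
    # dK/d(alpha_i) = -2 * (row sum of e) and dL/d(alpha_i) = 2 * (row sum of alpha+beta).
    V = [(-2 * L * sum(e[i]) - 2 * K * sum(alpha[i] + b for b in beta)) / L ** 2
         for i in range(n)]
    Q = [(-2 * L * sum(e[i][j] for i in range(n))
          - 2 * K * sum(a + beta[j] for a in alpha)) / L ** 2
         for j in range(m)]
    return V, Q


def descent_step(alpha, beta, V, Q, lam):
    """One application of equation (5) with step size lam."""
    return ([a - lam * v for a, v in zip(alpha, V)],
            [b - lam * q for b, q in zip(beta, Q)])


f = [[1.0, 2.0], [3.0, 5.0]]          # hypothetical transformed data
alpha, beta = [0.0, 2.0], [1.0, 2.0]  # hypothetical current scale values
V, Q = gradient(f, alpha, beta)
new_alpha, new_beta = descent_step(alpha, beta, V, Q, lam=1.0)
print(stress_squared(f, alpha, beta), stress_squared(f, new_alpha, new_beta))
```

On this small example the step lowers $S^2$; in general a fixed step size gives no such guarantee, which is why the step size is computed separately in the text.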
We need to estimate two quantities in order to proceed. First, it 
would be helpful to have a reasonable first approximation to the final 
solution for the $\alpha_i$ and $\beta_j$. We should be able to obtain such an 
approximation by assuming the $(A_iP_j)$ are additive and letting $\alpha_{i0}$ and $\beta_{j0}$ 
correspond to the least squares estimates 

(6a)    $$\alpha_{i0} = \frac{1}{m} \sum_{j=1}^{m} (A_iP_j) - C$$

(6b)    $$\beta_{j0} = \frac{1}{n} \sum_{i=1}^{n} (A_iP_j)$$

where $C$ represents the average $\alpha$ effect. 
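
A minimal sketch of these starting values, applied to the cell sums of Table 1, is given below. It assumes that $C$ is the grand mean of the table (the average of the row means), so that the row scale values sum to zero; the function and variable names are hypothetical.

```python
def initial_estimates(table):
    """Starting scale values in the spirit of (6a) and (6b).

    Assumption: C is taken to be the grand mean of the table.
    """
    n, m = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n * m)
    alpha0 = [sum(row) / m - grand_mean for row in table]               # row mean minus C
    beta0 = [sum(table[i][j] for i in range(n)) / n for j in range(m)]  # column mean
    return alpha0, beta0


# Cell sums from Table 1 (rows a1..a3, columns b1..b3).
table_1 = [[1.0, 26.0, 47.0],
           [18.0, 62.0, 95.0],
           [64.0, 134.0, 196.0]]

alpha0, beta0 = initial_estimates(table_1)
print([round(a, 3) for a in alpha0])
print([round(b, 3) for b in beta0])
```

Under this reading the starting values are already close to the final scale values reported in Table 2, which is what one would hope for from a least squares first approximation.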

The second quantity we need is $\lambda$, commonly known as the step size. 
In our case it is rather simple to compute if we observe that for any 
iteration, $k + 1$, we are trying to find a $\lambda$ which makes 

(7)    $$T = \sum_{i=1}^{n} \sum_{j=1}^{m} e_{ij}^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \alpha_{ik} - \lambda \frac{\partial S^2}{\partial \alpha_{ik}} + \beta_{jk} - \lambda \frac{\partial S^2}{\partial \beta_{jk}} - f(A_iP_j) \right)^2$$

as small as possible. Thus, the needed value of $\lambda$ is given by $dT/d\lambda = 0$, 
or, dropping the iteration subscripts and letting $V_i = \partial S^2/\partial \alpha_i$ and 
$Q_j = \partial S^2/\partial \beta_j$, 

(8)    $$\lambda = \frac{ m \sum_{i} \alpha_i V_i + \sum_{i} \sum_{j} \alpha_i Q_j + n \sum_{j} \beta_j Q_j + \sum_{i} \sum_{j} \beta_j V_i - \sum_{i} \sum_{j} f(A_iP_j) V_i - \sum_{i} \sum_{j} f(A_iP_j) Q_j }{ m \sum_{i} V_i^2 + 2 \sum_{i} \sum_{j} Q_j V_i + n \sum_{j} Q_j^2 }$$
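
The step-size formula can be checked numerically. The sketch below evaluates the six numerator terms and three denominator terms of (8) for a hypothetical direction and confirms that the result minimizes $T$ of (7) along that direction; the names and all numbers are assumptions made for the illustration.

```python
def step_size(f, alpha, beta, V, Q):
    """Step size lambda of equation (8), term by term."""
    n, m = len(alpha), len(beta)
    numerator = (m * sum(a * v for a, v in zip(alpha, V))
                 + sum(alpha) * sum(Q)                 # double sum of alpha_i * Q_j
                 + n * sum(b * q for b, q in zip(beta, Q))
                 + sum(beta) * sum(V)                  # double sum of beta_j * V_i
                 - sum(f[i][j] * V[i] for i in range(n) for j in range(m))
                 - sum(f[i][j] * Q[j] for i in range(n) for j in range(m)))
    denominator = (m * sum(v * v for v in V)
                   + 2 * sum(V) * sum(Q)               # double sum of Q_j * V_i
                   + n * sum(q * q for q in Q))
    return numerator / denominator


def residual_ss(f, alpha, beta, V, Q, lam):
    """T of equation (7) after a step of size lam along (V, Q)."""
    return sum((alpha[i] - lam * V[i] + beta[j] - lam * Q[j] - f[i][j]) ** 2
               for i in range(len(alpha)) for j in range(len(beta)))


f = [[1.0, 2.0], [3.0, 5.0]]          # hypothetical transformed data
alpha, beta = [0.0, 2.0], [1.0, 2.0]  # hypothetical current scale values
V, Q = [-0.1, -0.8], [-0.1, -0.8]     # hypothetical derivative values

lam = step_size(f, alpha, beta, V, Q)
print(lam, residual_ss(f, alpha, beta, V, Q, lam))
```

Because $T$ is a quadratic in $\lambda$, the value returned is its exact minimizer along the chosen direction; moving $\lambda$ in either direction raises $T$.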
In summary, given a set of data to be measured conjointly we (1) string 
the data out in a vector (for instance, take each column, one at a time, and 
string it into a single vector) and find the function $\sigma$, (2) find the 
initial approximations $\alpha_{i0}$ and $\beta_{j0}$, (3) string the numbers $\alpha_{i0} + \beta_{j0}$, 
$i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, into a similarly arranged vector as in step 1 
and apply the function $\sigma$, (4) find the function $f$, (5) solve equations 
(3), (4) and (8) and obtain improved estimates of $\alpha_i$ and $\beta_j$ and repeat 
starting from step 3. 
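
The five steps can be put together in a compact sketch. This is one reading of the procedure, not the original program: $f$ is found by pool-adjacent-violators monotone regression, the derivatives of $S^2 = K/L$ are taken in full, $C$ is assumed to be the grand mean, and the stress is recorded after step 4 of each pass. All names are hypothetical.

```python
import math


def monotone_nonincreasing(values):
    """Least-squares nonincreasing fit by pooling adjacent violators."""
    blocks = []
    for v in values:
        blocks.append([v, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def conjoint_fit(data, iterations=10):
    n, m = len(data), len(data[0])
    # Step 1: string the cells out column by column and find sigma.
    cells = [(i, j) for j in range(m) for i in range(n)]
    sigma = sorted(range(n * m), key=lambda c: -data[cells[c][0]][cells[c][1]])
    # Step 2: initial approximations (row means minus grand mean, column means).
    grand = sum(sum(row) for row in data) / (n * m)
    alpha = [sum(row) / m - grand for row in data]
    beta = [sum(data[i][j] for i in range(n)) / n for j in range(m)]
    history = []
    f = [[0.0] * m for _ in range(n)]
    for _ in range(iterations):
        # Steps 3-4: order alpha_i + beta_j by sigma and fit f monotonically.
        ordered = [alpha[cells[c][0]] + beta[cells[c][1]] for c in sigma]
        for c, value in zip(sigma, monotone_nonincreasing(ordered)):
            f[cells[c][0]][cells[c][1]] = value
        e = [[f[i][j] - alpha[i] - beta[j] for j in range(m)] for i in range(n)]
        K = sum(x * x for row in e for x in row)
        L = sum((a + b) ** 2 for a in alpha for b in beta)
        history.append(math.sqrt(K / L))
        # Step 5: derivatives of S^2, step size from (8), update by (5).
        V = [(-2 * L * sum(e[i]) - 2 * K * sum(alpha[i] + b for b in beta)) / L ** 2
             for i in range(n)]
        Q = [(-2 * L * sum(e[i][j] for i in range(n))
              - 2 * K * sum(a + beta[j] for a in alpha)) / L ** 2 for j in range(m)]
        num = sum((alpha[i] + beta[j] - f[i][j]) * (V[i] + Q[j])
                  for i in range(n) for j in range(m))
        den = sum((V[i] + Q[j]) ** 2 for i in range(n) for j in range(m))
        if den == 0.0:      # gradient vanished: nothing left to improve
            break
        lam = num / den
        alpha = [a - lam * v for a, v in zip(alpha, V)]
        beta = [b - lam * q for b, q in zip(beta, Q)]
    return alpha, beta, f, history


table_1 = [[1.0, 26.0, 47.0], [18.0, 62.0, 95.0], [64.0, 134.0, 196.0]]
alpha, beta, f, history = conjoint_fit(table_1, iterations=10)
print([round(s, 5) for s in history])
```

Run on the cell sums of Table 1, the first recorded stress values under this reading come out near .0129 and .0043, in line with the first entries of the iteration history in Table 2.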



III. EXAMPLES 

In order to illustrate the above outlined algorithm two examples will 
be presented. The first is some data taken from Winer (1962, p. 245). These 
data are supposed to represent a 3 x 3 analysis of variance with two 
observations per cell. The sums of the observations are presented in Table 1. 

Insert Table 1 about here 

The F-test for the interaction effect is significant at better than 
p = .01 and the associated sum of squares is 950.56. Winer conjectured 
that a square-root transformation would remove the sum of squares (SS) due 
to interaction, and, in fact, such a transformation reduced it to .30. 

Table 2 presents the same data after they have been transformed by a computer 
program designed to carry out CM . 

Insert Table 2 about here 
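
The interaction sum of squares quoted for Table 1 can be recovered from the cell sums alone with the usual computational formula for a two-way layout with equal replication. The sketch below is an illustration with hypothetical names; it uses only the Table 1 cell sums and the stated two observations per cell.

```python
def interaction_ss(cell_sums, reps):
    """Interaction sum of squares from cell sums with `reps` observations per cell."""
    p, q = len(cell_sums), len(cell_sums[0])
    row_totals = [sum(row) for row in cell_sums]
    col_totals = [sum(cell_sums[i][j] for i in range(p)) for j in range(q)]
    grand_total = sum(row_totals)
    cells_term = sum(x * x for row in cell_sums for x in row) / reps
    rows_term = sum(t * t for t in row_totals) / (reps * q)
    cols_term = sum(t * t for t in col_totals) / (reps * p)
    correction = grand_total ** 2 / (reps * p * q)
    return cells_term - rows_term - cols_term + correction


# Cell sums from Table 1 (two observations per cell).
table_1 = [[1.0, 26.0, 47.0],
           [18.0, 62.0, 95.0],
           [64.0, 134.0, 196.0]]

ss_axb = interaction_ss(table_1, reps=2)
print(round(ss_axb, 2))
```

The value agrees with the 950.56 reported in the text. The figure quoted for the square-root transformation cannot be reproduced this way, since it depends on the individual observations rather than on the cell sums.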
In approximately one second of 360/65 CPU time the program was 
terminated after 10 iterations and $S = .7 \times 10^{-6}$. The sum of squares for 
interaction was less than $.49 \times 10^{-7}$. Values of $S$ as well as the final scale 
values are also presented. We should expect $S = 0.0$ for data in which 
none of the profiles of plotted cell means cross between effects. These 
data were probably concocted for illustrative purposes by Winer in order 
to show that there often exist transformations on the original scale of 
measurement which render an essentially additive model. The computer 
program, however, made no assumptions about the form of the transformation 
except that it be monotonic, and recovered an essentially square-root 
transformation (linearly transformed) such that the plot of the cell means 
looks almost exactly like Winer's (p. 247) except that the SS for 
interaction is zero to five decimal places. 

The second example represents some data collected by Leibowitz and 
Bourne (1956) attempting to explore the conditions under which either 
retinal image or shape constancy obtain. They varied the degree of lumi- 
nance and the duration of exposure obtaining the data presented in Table 3. 

Insert Table 3 about here 

The data indicate that as luminance or exposure is increased shape constancy 
tends to obtain, and conversely under minimal viewing conditions (near 
threshold) retinal image tends to dominate. 
The data were input to the CM program and produced the results in 
Table 4. The solution was obtained after an arbitrary 20 iterations with 
$S = .311 \times 10^{-2}$ and sum of squares for interaction equal to .719 x 10 
[exponent illegible]. 

Insert Table 4 about here 

If we force the data to admit an additive structure it can be seen 
that the experimental design is highly redundant in that luminances of .1 
and 1.0 and exposures of .01 and .05, and .5 and .75 produce highly similar 
perceptions. Further, we have found a monotone transformation of the values 
expressing duration of exposure and amount of luminance which yields a very 
close additive model. 

IV. DISCUSSION 

After completing the development of the model and producing a 
computer program to perform CM it came to the author's attention that 
a very similar, although slightly more sophisticated, approach had been 
devised in a book by Roskam (1968). His approach proceeds by a direct 
minimization of an equation similar to (3) and appears to produce results 
similar to those reported here except for those cases extremely degenerate 
in form. Young (1969) has also reported an algorithm for doing polynomial 
CM in N -space which is a generalization (using a different algorithm) of 
the results reported here. 

The generalization of our approach to N-dimensional scale values is 
straightforward, though certainly coupled with some risk. In practice we 
surely would not let the total number of scale values exceed mn , and 
certainly should explore the tenability of a one-dimensional fit first. 
Footnotes 

¹ Formally, for $\geq$ to be a weak ordering the following must obtain for 
$q$, $r$ and $s \in A \times P$: (1) $q \geq q$ holds for all $q$, (2) $q \geq r$ and 
$r \geq s \Rightarrow q \geq s$, (3) either $q \geq r$ or $r \geq q$ or both. 

² Briefly, an archimedean axiom generally requires that for arbitrary 
$A_iP_j$ and $A_kP_\ell$ there exists an integer $n$ such that $nA_iP_j > A_kP_\ell$. 
Table 1 

Sum of Two Observations in a Two-way 
Analysis of Variance 
(from Winer, 1962, p. 245) 

            b1        b2        b3
a1         1.0      26.0      47.0
a2        18.0      62.0      95.0
a3        64.0     134.0     196.0

SS(AxB) = 950.56 
Table 2 

Data from Table 1 after Conjoint Measurement, 
Scale Values, and Values of S During Iteration 

Scale
Values                b1          b2          b3
                    27.658      75.251     111.424
a1    -48.037      -20.378      27.214      63.387
a2    -11.863       15.795      63.387      99.561
a3     59.900       87.559     135.151     171.324

Iteration
Number          S
   1       .129 x 10^-1
   2       .432 x 10^-2
   3       .144 x 10^-2
   4       .481 x 10^-3
   5       .16  x 10^-3
   6       .533 x 10^-4
   7       .178 x 10^-4
   8       .59  x 10^-5
   9       .2   x 10^-5
  10       .7   x 10^-6
Table 3 

Mean Ratios of Major Axis Length to Minor 
Axis Length of Ellipses Matched to a 
Standard (.5) at Varying Levels of 
Duration and Luminance (Leibowitz 
and Bourne (1956), p. 278) 

Exposure          Luminance (Millilamberts)
(sec.)           .01        .1        1.0
 .01            .486      .524      .515
 .05            .503      .528      .517
 .10            .522      .566      .570
 .25            .54?      .608      .692
 .50            .570      .688      .802
 .75            .575      .670      .790
1.00            .590      .737      .8?2
Table 4 

Data from Table 3 after 
Conjoint Measurement 

Scale
Values          .556      .636      .642
-.113           .443      .526      .526
-.113           .443      .526      .526
-.030           .526      .607      .613
 .052           .607      .688      .698
 .064           .620      .698      .708
 .064           .620      .698      .708
 .075           .631      .708      .718